NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

An SMT Formalization of Mixed-Precision Matrix Multiplication: Modeling Three Generations of Tensor Cores

Valpey, Benjamin; Li, Xinyi; Pai, Sreepathi; Gopalakrishnan, Ganesh (June 2025, NASA Formal Methods)
Titolo, Laura (Ed.)
Many recent computational accelerators provide non-standard (e.g., reduced precision) arithmetic operations to enhance performance for floating-point matrix multiplication. Unfortunately, the properties of these accelerators are not widely understood and lack sufficient descriptions of their behavior. This makes it difficult for tool builders beyond the original vendor to target or simulate the hardware correctly, or for algorithm designers to be confident in their code. To address these gaps, prior studies have probed the behavior of these units with manually crafted tests. Such tests are cumbersome to design, and adapting them as the accelerators evolve requires repeated manual effort. We present a formal model for the tensor cores of NVIDIA’s Volta, Turing, and Ampere GPUs. We identify specific properties—rounding mode, precision, and accumulation order—that drive these cores’ behavior. We formalize these properties and then use the formalization to automatically generate discriminating inputs that illustrate differences among machines. Our results confirm many of the findings of previous tensor core studies, but also identify subtle disagreements. In particular, NVIDIA’s machines do not, as previously reported, use round-to-zero for accumulation, and their 5-term accumulator requires 3 extra carry-out bits for full accuracy. Using our formal model, we analyze two existing algorithms that use half-precision tensor cores to accelerate single-precision multiplication with error correction. Our analysis reveals that the newer algorithm, designed to be more accurate than the first, is actually less accurate for certain inputs.
more » « less
Free, publicly-accessible full text available June 12, 2026
Vellvm: Formalizing the Informal LLVM: (Experience Report)

https://doi.org/10.1007/978-3-031-93706-4_6

Beck, Calvin; Chen, Hanxi; Zdancewic, Steve (June 2025, LNCS)
Dutle, Aaron; Humphrey, Laura; Titolo, Laura (Ed.)
Free, publicly-accessible full text available June 8, 2026
Modeling code manipulation in JIT compilers

https://doi.org/10.1145/3520313.3534656

Lim, HeuiChan; Kang, Xiyu; Debray, Saumya (June 2022, Proceedings of the 11th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis)
Gonnord, Laure; Titolo, Laura (Ed.)
Just-in-Time (JIT) compilers are widely used to improve the performance of interpreter-based language implementations by creating optimized code at runtime. However, bugs in the JIT compiler’s code manipulation and optimization can result in the generation of incorrect code. Such bugs can be difficult to diagnose and fix, and can result in exploitable vulnerabilities. Unfortunately, existing approaches to automatic bug localization do not carry over well to such bugs. This paper discusses a different approach to analyzing JIT compiler optimization behaviors, based on using dynamic analysis to construct abstract models of the JIT compiler’s optimizer and back end. By comparing the models obtained for buggy and non-buggy executions of the JIT compiler, we can pinpoint the components of the JIT compiler’s internal representation that have been affected by the bug; this can then be mapped back to identify the buggy code. Our ex- periments with two real bugs for Google V8 JIT compiler, TurboFan, show the utility and practicality of our approach.
more » « less
Full Text Available

Search for: All records